expert network
iFlyBot-VLA Technical Report
Zhang, Yuan, Xue, Chenyu, Xu, Wenjie, Ji, Chao, Wu, Jiajia, Pan, Jia
We introduce iFlyBot-VLA, a large-scale Vision-Language-Action (VLA) model trained under a novel framework. The main contributions are listed as follows: (1) a latent action model thoroughly trained on large-scale human and robotic manipulation videos; (2) a dual-level action representation framework that jointly supervises both the Vision-Language Model (VLM) and the action expert during training; (3) a mixed training strategy that combines robot trajectory data with general QA and spatial QA datasets, effectively enhancing the 3D perceptual and reasoning capabilities of the VLM backbone. Specifically, the VLM is trained to predict two complementary forms of actions: latent actions, derived from our latent action model pretrained on cross-embodiment manipulation data, which capture implicit high-level intentions; and structured discrete action tokens, obtained through frequency-domain transformations of continuous control signals, which encode explicit low-level dynamics. This dual supervision aligns the representation spaces of language, vision, and action, enabling the VLM to directly contribute to action generation. Experimental results on the LIBERO Franka benchmark demonstrate the superiority of our framework, while real-world evaluations further show that iFlyBot-VLA achieves competitive success rates across diverse and challenging manipulation tasks. Furthermore, we plan to open-source a portion of our self-constructed dataset to support future research in the community.
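The abstract's "structured discrete action tokens, obtained through frequency-domain transformations of continuous control signals" can be illustrated with a minimal sketch. The scheme below (orthonormal DCT-II, low-frequency truncation, uniform quantization, and the specific coefficient/bin counts) is an assumed, illustrative pipeline, not the report's actual tokenizer.

```python
import numpy as np

def dct2(x):
    """Orthonormal DCT-II of a 1-D signal (type-II discrete cosine transform)."""
    n = len(x)
    k = np.arange(n)
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ x)

def tokenize_trajectory(traj, n_coeffs=8, n_bins=256, lo=-1.0, hi=1.0):
    """Map one joint's continuous trajectory to discrete tokens:
    keep the n_coeffs lowest DCT frequencies (explicit low-level dynamics),
    then quantize each coefficient into n_bins uniform bins over [lo, hi]."""
    coeffs = dct2(np.asarray(traj, dtype=float))[:n_coeffs]
    clipped = np.clip(coeffs, lo, hi)
    return np.round((clipped - lo) / (hi - lo) * (n_bins - 1)).astype(int)

traj = np.sin(np.linspace(0, np.pi, 32)) * 0.5   # toy joint-angle trajectory
print(tokenize_trajectory(traj))                 # 8 integer tokens in [0, 255]
```

Truncating to the lowest frequencies is what makes the tokens "structured": smooth control signals concentrate their energy there, so a short token sequence still describes the whole motion.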
SpotDiff: Spotting and Disentangling Interference in Feature Space for Subject-Preserving Image Generation
Li, Yongzhi, Zhang, Saining, Chen, Yibing, Li, Boying, Zhang, Yanxin, Du, Xiaoyu
Personalized image generation aims to faithfully preserve a reference subject's identity while adapting to diverse text prompts. Existing optimization-based methods ensure high fidelity but are computationally expensive, while learning-based approaches offer efficiency at the cost of entangled representations influenced by nuisance factors. We introduce SpotDiff, a novel learning-based method that extracts subject-specific features by spotting and disentangling interference. Leveraging a pre-trained CLIP image encoder and specialized expert networks for pose and background, SpotDiff isolates subject identity through orthogonality constraints in the feature space. To enable principled training, we introduce SpotDiff10k, a curated dataset with consistent pose and background variations. Experiments demonstrate that SpotDiff achieves more robust subject preservation and controllable editing than prior methods, while attaining competitive performance with only 10k training samples.
Distribution-aware Forgetting Compensation for Exemplar-Free Lifelong Person Re-identification
Liu, Shiben, Fan, Huijie, Wang, Qiang, Fan, Baojie, Tang, Yandong, Qu, Liangqiong
Lifelong Person Re-identification (LReID) faces a key challenge in preserving old knowledge while adapting to new information. Existing solutions include rehearsal-based and rehearsal-free methods. Rehearsal-based approaches rely on knowledge distillation and continuously accumulate forgetting during the distillation process. Rehearsal-free methods insufficiently learn the distribution of each domain, leading to gradual forgetting over time. To solve these issues, we propose a novel Distribution-aware Forgetting Compensation (DAFC) model that explores cross-domain shared representation learning and domain-specific distribution integration without using old exemplars or knowledge distillation. We propose a Text-driven Prompt Aggregation (TPA) that utilizes text features to enrich prompt elements and guide the prompt model to learn fine-grained representations for each instance. This enhances the differentiation of identity information and establishes the foundation for domain distribution awareness. Then, Distribution-based Awareness and Integration (DAI) is designed to capture each domain-specific distribution with a dedicated expert network and adaptively consolidate them into a shared region in high-dimensional space. In this manner, DAI consolidates and enhances cross-domain shared representation learning while alleviating catastrophic forgetting. Furthermore, we develop a Knowledge Consolidation Mechanism (KCM) comprising instance-level discrimination and cross-domain consistency alignment strategies, which respectively facilitate adaptive learning of new knowledge from the current domain and promote consolidation across acquired domain-specific distributions. Experimental results show that our DAFC outperforms state-of-the-art methods. Our code is available at https://github.com/LiuShiBen/DAFC.
- Information Technology > Security & Privacy (0.46)
- Education (0.46)
- Social Sector (0.34)
Physics-informed mixture of experts network for interpretable battery degradation trajectory computation amid second-life complexities
Huang, Xinghao, Tao, Shengyu, Liang, Chen, Chen, Jiawei, Shi, Junzhe, Li, Yuqi, Xia, Bizhong, Zhou, Guangmin, Zhang, Xuan
Retired electric vehicle batteries offer immense potential to support low-carbon energy systems, but uncertainties in their degradation behavior and data inaccessibility under second-life use pose major barriers to safe and scalable deployment. This work proposes a Physics-Informed Mixture of Experts (PIMOE) network that computes battery degradation trajectories using partial, field-accessible signals from a single cycle. PIMOE leverages an adaptive multi-degradation prediction module to classify degradation modes using expert weight synthesis underpinned by capacity-voltage and relaxation data, producing latent degradation trend embeddings. These are input to a use-dependent recurrent network for long-term trajectory prediction. Validated on 207 batteries across 77 use conditions and 67,902 cycles, PIMOE achieves an average mean absolute percentage error (MAPE) of 0.88% with a 0.43 ms inference time. Compared to the state-of-the-art Informer and PatchTST, it reduces computational time and MAPE by 50%. Compatible with random state-of-charge region sampling, PIMOE supports 150-cycle forecasts with 1.50% average and 6.26% maximum MAPE, and operates effectively even with pruned 5 MB training data. Broadly, the PIMOE framework offers a deployable, history-free solution for battery degradation trajectory computation, redefining how second-life energy storage systems are assessed, optimized, and integrated into the sustainable energy landscape.
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Energy > Renewable (1.00)
- Energy > Energy Storage (1.00)
- Transportation > Ground > Road (0.88)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
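The PIMOE results above are reported as mean absolute percentage error. For reference, MAPE over a predicted capacity trajectory is just:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Toy capacity fade curve vs. a prediction that is uniformly 1% high.
true = np.linspace(1.00, 0.80, 50)
pred = true * 1.01
print(round(mape(true, pred), 2))  # 1.0
```

So the reported 0.88% average MAPE means the predicted capacity is, on average, within about one percent of the measured value at each cycle.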
STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation
Wang, Yiming, Peng, Hao, Wang, Senzhang, Du, Haohua, Liu, Chunyang, Wu, Jia, Wu, Guanlin
Traffic data imputation is fundamentally important for supporting various applications in intelligent transportation systems, such as traffic flow prediction. However, existing time-to-space sequential methods often fail to effectively extract features in block-wise missing data scenarios. Meanwhile, a static graph structure for spatial feature propagation significantly constrains a model's flexibility in handling distribution shift in nonstationary traffic data. To address these issues, this paper proposes a Spatio-Temporal Attention Mixture-of-Experts network named STAMImputer for traffic data imputation. Specifically, we introduce a Mixture of Experts (MoE) framework to capture latent spatio-temporal features and their influence weights, effectively imputing block-wise missing data. A novel Low-rank guided Sampling Graph ATtention (LrSGAT) mechanism is designed to dynamically balance local and global correlations across road networks. The sampled attention vectors are used to generate dynamic graphs that capture real-time spatial correlations. Extensive experiments on four traffic datasets show that STAMImputer achieves significant performance improvements over existing SOTA approaches. Our codes are available at https://github.com/RingBDStack/STAMImupter.
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Asia > China > Hunan Province > Changsha (0.04)
- Asia > China > Hebei Province (0.04)
- Asia > China > Guangdong Province > Shantou (0.04)
- Transportation > Infrastructure & Services (0.55)
- Consumer Products & Services > Travel (0.34)
- Information Technology > Data Science > Data Quality (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
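The idea of turning sampled attention vectors into a dynamic graph, as LrSGAT does, can be sketched under simplifying assumptions: plain scaled dot-product attention with hard top-k sparsification standing in for sampling, and the low-rank guidance omitted entirely.

```python
import numpy as np

def dynamic_graph(node_feats, k=2):
    """Build a sparse dynamic adjacency from attention scores:
    score every node pair with scaled dot-product attention, then keep
    only each node's top-k neighbors (a crude stand-in for sampling)."""
    x = np.asarray(node_feats, dtype=float)
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)             # pairwise attention logits
    np.fill_diagonal(scores, -np.inf)         # no self-loops
    adj = np.zeros_like(scores)
    for i, row in enumerate(scores):
        top = np.argpartition(row, -k)[-k:]   # indices of the k largest logits
        w = np.exp(row[top] - row[top].max())
        adj[i, top] = w / w.sum()             # row-normalized edge weights
    return adj

feats = np.random.default_rng(0).normal(size=(5, 4))  # 5 sensors, 4-dim features
A = dynamic_graph(feats, k=2)
print(A.shape, A.sum(axis=1))  # (5, 5), each row sums to 1
```

Because the adjacency is recomputed from the current features, the graph tracks real-time spatial correlations instead of being fixed ahead of time, which is exactly the flexibility the abstract argues static graphs lack.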
On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks
Mixture-of-experts networks (MoEs) have demonstrated remarkable efficiency in modern deep learning. Despite their empirical success, the theoretical foundations underlying their ability to model complex tasks remain poorly understood. In this work, we conduct a systematic study of the expressive power of MoEs in modeling complex tasks with two common structural priors: low-dimensionality and sparsity. For shallow MoEs, we prove that they can efficiently approximate functions supported on low-dimensional manifolds, overcoming the curse of dimensionality. For deep MoEs, we show that $\mathcal{O}(L)$-layer MoEs with $E$ experts per layer can approximate piecewise functions comprising $E^L$ pieces with compositional sparsity; that is, they can express an exponential number of structured tasks. Our analysis reveals the roles of critical architectural components and hyperparameters in MoEs, including the gating mechanism, expert networks, the number of experts, and the number of layers, and offers natural suggestions for MoE variants.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
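A minimal softmax-gated MoE layer makes the components named in the analysis (gating mechanism, expert networks, number of experts, number of layers) concrete. This is a generic sketch with linear experts, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class MoELayer:
    """One MoE layer: a linear gate routes each input over E linear experts,
    and the output is the gate-weighted sum of the expert outputs."""
    def __init__(self, d_in, d_out, n_experts):
        self.gate = rng.normal(size=(d_in, n_experts)) * 0.1
        self.experts = rng.normal(size=(n_experts, d_in, d_out)) * 0.1

    def __call__(self, x):                             # x: (batch, d_in)
        w = softmax(x @ self.gate)                     # (batch, E) gating weights
        y = np.einsum('bi,eio->beo', x, self.experts)  # every expert's output
        return np.einsum('be,beo->bo', w, y)           # weighted mixture

# Stacking L such layers with E experts each yields up to E**L routing
# patterns, which is the combinatorial structure behind the E^L-piece result.
layer = MoELayer(d_in=8, d_out=8, n_experts=4)
x = rng.normal(size=(3, 8))
print(layer(x).shape)  # (3, 8)
```

With a hard (argmax) gate instead of softmax, each layer partitions its input space into E regions, and composing L layers gives the piecewise structure the theorem counts.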
ARTEMIS: Autoregressive End-to-End Trajectory Planning with Mixture of Experts for Autonomous Driving
Feng, Renju, Xi, Ning, Chu, Duanfeng, Wang, Rukang, Deng, Zejian, Wang, Anzheng, Lu, Liping, Wang, Jinxiang, Huang, Yanjun
This paper presents ARTEMIS, an end-to-end autonomous driving framework that combines autoregressive trajectory planning with Mixture-of-Experts (MoE). Traditional modular methods suffer from error propagation, while existing end-to-end models typically employ static one-shot inference paradigms that inadequately capture the dynamic changes of the environment. ARTEMIS takes a different approach: it generates trajectory waypoints sequentially, preserving critical temporal dependencies while dynamically routing scene-specific queries to specialized expert networks. It effectively relieves trajectory quality degradation when guidance information is ambiguous, and overcomes the inherent representational limitations of a single network architecture when processing diverse driving scenarios. Additionally, we use a lightweight batch reallocation strategy that significantly improves the training speed of the Mixture-of-Experts model. In experiments on the NAVSIM dataset, ARTEMIS achieves 87.0 PDMS and 83.1 EPDMS with a ResNet-34 backbone, demonstrating state-of-the-art performance on multiple metrics. Code will be available at https://github.
- Asia > China > Hubei Province > Wuhan (0.05)
- Asia > China > Hong Kong (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Robots (0.70)
- Information Technology > Artificial Intelligence > Vision (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
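ARTEMIS's sequential waypoint generation contrasts with one-shot decoding of the whole trajectory. The loop below sketches only the autoregressive pattern; the step model here is a hypothetical stand-in, not ARTEMIS's network.

```python
import numpy as np

def plan_autoregressive(start, step_model, horizon=8):
    """Generate trajectory waypoints one at a time, feeding each
    predicted waypoint back in as context for the next prediction
    (versus one-shot decoding of all waypoints at once)."""
    waypoints = [np.asarray(start, dtype=float)]
    for _ in range(horizon):
        nxt = step_model(waypoints)   # condition on all waypoints so far
        waypoints.append(nxt)
    return np.stack(waypoints)

# Stand-in "model": continue at the current velocity plus a slight drift.
def toy_step(history):
    pos = history[-1]
    vel = history[-1] - history[-2] if len(history) > 1 else np.array([1.0, 0.0])
    return pos + vel + np.array([0.0, 0.05])

traj = plan_autoregressive([0.0, 0.0], toy_step, horizon=8)
print(traj.shape)  # (9, 2): start plus 8 generated waypoints
```

Because each waypoint conditions on its predecessors, temporal dependencies are preserved; in the full model the step function would also consume scene features and route its query through the MoE experts.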
BrainNet-MoE: Brain-Inspired Mixture-of-Experts Learning for Neurological Disease Identification
Zhang, Jing, Yu, Xiaowei, Chen, Tong, Cao, Chao, Chen, Mingheng, Zhuang, Yan, Lyu, Yanjun, Zhang, Lu, Su, Li, Liu, Tianming, Zhu, Dajiang
Lewy body dementia (LBD) is the second most common neurodegenerative dementia after Alzheimer's disease (AD). Early differentiation between AD and LBD is crucial because they require different treatment approaches, but it is challenging due to significant clinical overlap, heterogeneity, complex pathogenesis, and the rarity of LBD. While recent advances in artificial intelligence (AI) demonstrate powerful learning capabilities and offer new hope for accurate diagnosis, existing methods primarily focus on designing neural-level networks. Our work represents a pioneering effort in modeling a system-level artificial neural network, BrainNet-MoE, for brain modeling and diagnosis. Inspired by the brain's hierarchical organization of bottom-up sensory integration and top-down control, we design a set of disease-specific expert groups to process brain sub-networks under different conditions. A disease gate mechanism guides the specialization of expert groups, while a transformer layer enables communication between all sub-networks, generating a comprehensive whole-brain representation for downstream disease classification. Experimental results show superior classification accuracy with interpretable insights into how brain sub-networks contribute to different neurodegenerative conditions. Keywords: Brain-inspired AI, Mixture of Experts, Dementia.
- North America > United States > Texas (0.14)
- North America > United States > Indiana (0.14)
- North America > United States > Georgia > Clarke County > Athens (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)
- Health & Medicine > Therapeutic Area > Neurology > Dementia (0.92)